perm filename KRD2.MSS[PEG,DBL] blob sn#453056 filedate 1979-06-30 generic text, type C, neo UTF8
.S(Background,|TEIRESIAS, computer consultants, production rules and some vocabulary|);
.BEGINSMALLQUOTE;TURN ON "→";
→It vexes me what ails him.
.ENDSMALLQUOTE(|%2Oedipus the King%*, line 74|);
.SS(Introduction);
	The first part of this chapter provides a brief overview of && and the
sort of performance programs it is designed to help build.  We describe the
knowledge representations used, review the control structure, and
introduce several concepts and terms that will be useful vocabulary in
subsequent chapters.
	{YON1 PRODRULES} explores some general ideas about production
rules, showing how they have been used to develop what amounts to a high-level
language and indicating how this language forms the basis for many of
the capabilities discussed later.
	{YON1 LEVELSOFK} considers briefly the problem of high performance vs.
generality and examines the work we have done in that light, considering in
particular the benefits of a hierarchical layering of types of knowledge.

.ind computer consultants
.SS(Computer consultants);
	The recent growth of interest in the class of programs known as
computer consultants can be seen as a logical consequence of the two trends
noted in chapter 1--an emphasis on large stores of domain-specific
knowledge and the concentration on problems taken from real world settings.
These programs are intended to provide expert-level advice on a difficult
.ind expertise
cognitive problem, perhaps one for which human expertise is in short
supply.∪∪The concept is defined here in terms that assume the existence of
human experts to insure that knowledge in the domain is sufficiently
advanced that it can reliably support high performance.  Human experts also
supply one of the best sources for this knowledge (learning from textbooks
has also been considered) and offer a standard by which to gauge program
performance.  The definition includes only cognitive tasks, to minimize
complicating factors that result from the reliance of human performance on
highly specialized processors (as appears to be true for vision and
manipulation).∪
	Several programs of this type are currently under development.  A
program for diagnosis of glaucoma is described in [[Kulikowski73]; internal
medicine is the domain for another effort described by [[Pople75]; and work
on a consultation program for electro-mechanical repair tasks is reported
in [[Hart75].  These programs all rely on large stores of domain-specific
knowledge for their performance, and thus could be considered candidates
for the kind of performance program && has been designed to help construct.
.XGENLINES←XGENLINES-3; comment page number falls off due to fn and chapter title;

.SSS(Mycin);
	One such program, the {SYSTM MYCIN} system ([[Shortliffe76],
[[Davis77b]), provided the context in which && was developed and played the
role of the performance program in {YONFIG OVERLAY}.  {SYSTM MYCIN} was
designed to provide consultative advice on diagnosis and therapy for
infectious diseases.  Such advice is often required in the hospital because
the attending physician may not be an expert on infectious disease--as, for
example, when a cardiology patient develops an infection after heart
surgery. Time considerations compound the problem. A specimen (blood,
urine, etc.) from a patient can show some early evidence of bacterial
growth within 12 hours, but 24 to 48 hours (or more) are usually required
for positive identification.  The physician must therefore often decide in
the absence of complete information whether or not to start treatment and
what drugs to use if treatment is required.  Both of these may be difficult
questions.
	Figures {FIGNUM! CONSULTSTART} and {FIGNUM! CONSULTEND} show the
initial and final parts of a sample interaction between a physician and the
program (italicized comments at the right provide additional commentary but
are not part of the actual dialog).  {SYSTM MYCIN} in effect "interviews"
the doctor about his patient, collecting information that will allow it to
infer the diagnosis and select an appropriate therapy.
.FULLPAGEFIG;TURN ON "→";
--------PATIENT-1--------
1) Patient's name: (first-last)
**%3FRED SMITH %*                 	→%2α{Background data.α}%*
2) Sex:
**%3MALE %*
3) Age:
**%355 %*
4) Have you been able to obtain positive cultures from a site 
   at which Fred Smith has an infection?
**%3YES%*
  --------INFECTION-1--------
  5) What is the infection?
  **%3PRIMARY-BACTEREMIA %*
  6) Please give date and time when signs or symptoms of the
     primary-bacteremia (INFECTION-1) first appeared.
  **%3 MAY 5, 1977  18:00%*
  The most recent positive culture associated with the primary-
  bacteremia (INFECTION-1) will be referred to as:
    --------CULTURE-1--------
    7) From what site was the specimen for CULTURE-1 taken?
    **%3BLOOD %*
    8) Please give the date and time when this blood culture 
       (CULTURE-1) was obtained. (mo/da/yr)
    **%3 MAY 9, 1977 %*
    The first significant organism from this blood culture 
    (CULTURE-1) will be referred to as:
      --------ORGANISM-1--------
      9) Enter the identity  of ORGANISM-1:
      **%3UNKNOWN %*
→%2α{Typically the identity is not known and must be inferred.α}%*
      10) Is ORGANISM-1 a rod or coccus (etc.):
      **%3ROD %*
      11) The gram stain of ORGANISM-1:
      **%3GRAMNEG %*
      12) Have there been POSITIVE cultures from sites which
	  are NOT associated with infections about which you
	  are seeking advice?
      **%3NO%*
      13) Did ORGANISM-1 grow aerobically?
      **%3YES (.8)%*
              ...
.STARTCOM; INDENT 10,10,10;
{More questions follow, to gather sufficient information to infer
possible identities of the infecting organisms.}
.ENDCOM;
.FIG(The initial part of a consultation,CONSULTSTART:);
.ENDFIG;
.FULLPAGEFIG;TURN ON "→";
My therapy recommendation will be based on the following 
probable infection(s) and potential causative organism(s):

INFECTION-1 is PRIMARY-BACTEREMIA	→%2α{Possible identities.}%*
<1>  The identity of ORGANISM-1 may be PSEUDOMONAS-AERUGINOSA 
<2>  The identity of ORGANISM-1 may be KLEBSIELLA-PNEUMONIAE
<3>  The identity of ORGANISM-1 may be E.COLI
<4>  The identity of ORGANISM-1 may be BACTEROIDES-FRAGILIS
<5>  The identity of ORGANISM-1 may be ENTEROBACTER
<6>  The identity of ORGANISM-1 may be PROTEUS-NON-MIRABILIS

46) On a scale of 0 to 4 (higher numbers indicate increasing
    severity), how would you rate Fred Smith's degree of 
    sickness?
**%33%*
47) Does Fred Smith have a clinically significant allergic 
    reaction to any antimicrobial agent?
**%3NO %*
48) Fred Smith's weight in kilograms (or <number> POUNDS):
**%370%*
		→%2α{Note that 2 drugs cover for 6 potential organisms.α}%*
[Rec 1] My preferred therapy recommendation is as follows:
   In order to cover for Items <1 2 3 5 6>:
        Give: GENTAMICIN
        Dose: 119 mg (1.7 mg/kg) q8h IV [or IM] for 10 days
        Comments: Modify dose in renal failure
   In order to cover for Item <4>:
        Give: CLINDAMYCIN
        Dose: 595 mg (8.5 mg/kg) q6h IV [or IM] for 14 days
        Comments: If diarrhea or other GI symptoms develop, 
		  patient should be evaluated for possible 
		  pseudomembranous colitis.

Do you wish to see the next choice therapy?
**%3 YES%* 	→%2α{Alternative choices are available.α}%*
Which drug do you wish eliminated from consideration?
**%3CLINDAMYCIN%*
[Rec 2] Next best therapy recommendation:
   In order to cover for Items <2 3 4 5 6>:
        Give: CHLORAMPHENICOL
        Dose: 1 gm (15 mg/kg) q6h IV for 14 days
        Comments: Monitor patient's white count
   In order to cover for Item <1>:
        Give: GENTAMICIN
        Dose: 119 mg (1.7 mg/kg) q8h IV [or IM] for 10 days
        Comments: Modify dose in renal failure

.FIG(The final part of a consultation,CONSULTEND:);
.ENDFIG;
.SKIP TO COLUMN 1; SS(Teiresias:# System organization);
	As we have noted, && provides a number of tools designed to aid in the
construction, maintenance, and use of the knowledge base in the performance
program.  The overall structure of the system is shown in 
{YONFIG DETAILEDOVERLAY}, a more detailed version of {YONFIG OVERLAY}.  In
order to focus on the issues of knowledge acquisition, explanation,
etc., covered in later chapters, we will henceforth adopt the perspective
suggested in {YONFIG DETAILEDOVERLAY}
and consider the performance program as the simple entity
indicated there.    {SYSTM MYCIN} is of course more complex, but this abstraction
contains all the detail necessary for our purposes.
.ind strategies
.STARTFIG; BOXFIG; TURN OFF "%";

	 ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃
	 }         TEIRESIAS                                 }
	 }                            ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃  }
	 }                            }PERFORMANCE PROGRAM}  }
	 }  ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃           }   ⊂∂∂∂∂∂∂∂∂∂∂∂⊃   }  }
  E  ← ∂ ∂ ∂} EXPLANATION }← ∂ ∂ ∂ ∂ ∂}∂ ∂} INFERENCE }   }  }
	 }  }             }           }   }  ENGINE   }   }  }
  X      }  %∂∂∂∂∂∂∂∂∂∂∂∂∂$           }   %∂∂∂∂∂∂∂∂∂∂∂$   }  }
	 }                            }   ⊂∂∂∂∂∂∂∂∂∂∂∂⊃   }  }
  P      }  ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃           }   } KNOWLEDGE }   }  }
     ∂ ∂ ∂ →} KNOWLEDGE   }∂ ∂ ∂ ∂ ∂ ∂}∂ →}   BASE    }   }  }
  E      }  } ACQUISITION }           }   %∂∂∂∂∂∂∂∂∂∂∂$   }  }
	 }  %∂∂∂∂∂∂∂∂∂∂∂∂∂$           }         ↑         }  }
  R      }                            %∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$  }
     ∂ ⊃ }                                      }            }
  T      }              ⊂∂∂∂∂∂∂∂∂∂∂∂∂⊃                       }
       % ∂ ∂ ∂ ∂ ∂ ∂ ∂ →} STRATEGIES }∂ ∂ ∂ ∂ ∂ $            }
	 }              %∂∂∂∂∂∂∂∂∂∂∂∂$                       }
	 %∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$

.FIG(Teiresias:  System organization, DETAILEDOVERLAY:);
.ENDFIG;
.ind explanation
	&&'s %2explanation%* facility (see chapter 3) uses both the information in the
.ind knowledge base
.ind inference engine
knowledge base and an understanding of the design of the inference engine
to provide
the expert with explanations of the performance program's behavior.  The
.ind knowledge acquisition
%2knowledge acquisition%* facility (chapters 4-6) allows the expert to transfer
his knowledge of the field into the knowledge base, in order to increase the
performance program's competence.  Finally, the base of %2strategy knowledge%*
(chapter 7) provides a mechanism for expressing strategies concerning the use of
information in the knowledge base.∪∪Ideally, it should be possible for the expert
to add strategy knowledge to the system using the knowledge acquisition
facility, but this has not as yet been implemented.  See chapter 7 for details.∪
	&& is written in {PRLANG INTERLISP}, an advanced dialect of the 
{PRLANG LISP} language, and runs on a DEC PDP-10 under Tenex.  The knowledge acquisition
program occupies approximately 40,000 (36-bit) words; the explanation
program, 10,000; and the strategy knowledge base, 5,000.
.beginind knowledge base
.SSS(Performance program:# Knowledge base organization);
.BEGINSMALLQUOTE;TURN ON "→";INDENT 0,0,0;
→In time you will know all with certainty.
.ENDSMALLQUOTE (|%2Oedipus the King%*, line 613|);
	&& is designed to deal with knowledge encoded in the form of inference
.ind inference rules
rules of the sort shown in {YONFIG RULE}.  The rules are stored internally in the
{PRLANG INTERLISP} form shown, from which
the English version is generated.  Each
rule is a single "chunk" of domain-specific information indicating an
%2action%* (in this case a conclusion) that is justified if the conditions
specified in the %2premise%* are fulfilled.
.STARTFIG;
RULE050
-------
.SKIP 1;
If  1) the infection is primary-bacteremia, and
    2) the site of the culture is one of the sterile sites, and
    3) the suspected portal of entry of the organism is the 
       gastrointestinal tract,
then there is suggestive evidence (.7) that the identity of the 
     organism is bacteroides.


PREMISE	   ($AND (SAME CNTXT INFECT PRIMARY-BACTEREMIA)
	         (MEMBF CNTXT SITE STERILESITES)
	         (SAME CNTXT PORTAL GI))
ACTION	   (CONCLUDE CNTXT IDENT BACTEROIDES TALLY .7)
.FIG(An inference rule,RULE:);
.ENDFIG;CONTINUE;
Note that the rules are judgmental, that is, they make inexact inferences.  In the case of
the rule in {YONFIG RULE}, for instance, the evidence cited in the premise is
enough to assert the conclusion shown with a mild degree of confidence:# 
 .7 out of 1.0.  This  number is called a "certainty factor," or CF, and
embodies a model of
confirmation described in [[Shortliffe75b].  The details of this model need not
concern us here; we need only note that the rules typically express
inexact inferences rather than statements made with certainty.∪∪$AND (the multivalued analogue of the Boolean AND)
performs a minimization operation; $OR (similar) does a maximization.  Note
that, unlike standard probability theory, $AND does not involve any
multiplication over its arguments.  Since CFs are not probabilities, there is no
a#priori reason why a product should be a reasonable number. There is,
moreover, a long-standing convention in work with multivalued logics which
interprets AND as %2min%* and OR as %2max%* [[Lukasciewicz70]. It is based
primarily on intuitive grounds:# If a conclusion requires all of its antecedents
to be true, then it is a relatively safe and conservative strategy to use the
smallest of the antecedent values as the value of the premise.  Similarly, if
any one of the antecedent clauses justifies the conclusion, it is safe to take
the maximum value.∪
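The min/max convention described in the footnote can be sketched in a few
lines of Python (the helper names are ours, not MYCIN's; the actual system
evaluates $AND and $OR in INTERLISP):

```python
# Sketch of $AND/$OR over certainty factors (CFs).  CFs range over
# [-1, 1] and are not probabilities; following the multivalued-logic
# convention, $AND takes the minimum of its arguments and $OR the
# maximum, rather than multiplying as probability theory would.

def cf_and(*cfs):
    """Value of a conjunctive premise: the weakest antecedent."""
    return min(cfs)

def cf_or(*cfs):
    """Value of a disjunctive clause: the strongest alternative."""
    return max(cfs)

# A premise of the form ($AND c1 ($OR c2 c3)) with clause CFs .9, .4, .7:
premise_cf = cf_and(0.9, cf_or(0.4, 0.7))   # min(0.9, max(0.4, 0.7)) = 0.7
```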
	The premise of each rule is a Boolean combination
of one or more %2clauses%*, each of which is constructed from a %2predicate
.ind associative triple
function%* with an %2associative triple%* (%2attribute, object, value%*) as
its argument.  Thus each clause of a premise has the following four
components:
.STARTCENTERIT(5);
<predicate function>    <object>    <attribute>   <value>
.ENDCENTERIT;
For the third clause in the premise of {YONFIG RULE}, for example, the predicate
function is %ASAME%*, and the triple is "%2portal-of-entry%* of %2organism%*
is %2GI-tract%*."# (%ACNTXT%* is a free variable which is bound to the specific 
object [also called a "context"] for which the rule is invoked.)#
There is a standardized set of some 24 domain-independent predicate functions
(e.g., %ASAME, KNOWN, DEFINITE%*) and a range of domain-specific attributes
(e.g., %AIDENTITY, SITE%*), objects (e.g., %AORGANISM, CULTURE%*), and
associated values (e.g., %AE.COLI, BLOOD%*).  These form a "vocabulary" of
conceptual primitives available for use in constructing rules.
	A rule premise is always a conjunction of clauses, but it may contain
arbitrarily complex conjunctions or disjunctions nested within each clause.
(Instead of writing rules whose premise would be a disjunction of clauses, a
separate rule is written for each clause.)  The action part indicates one or
more conclusions that can be drawn if the premises are satisfied, making the
rules purely inferential.
	Each rule is intended to embody a single, independent chunk of
knowledge and  states all necessary information explicitly in the premise.  Since
the rule uses a vocabulary of concepts common to the domain, it forms,
by itself, a comprehensible statement of some piece of domain knowledge.  As will
become clear, this characteristic is useful in many ways. 
	Each rule is highly stylized, with the if/then format and the specified
set of available primitives.  While the {PRLANG LISP} form of each is executable
code (the premise, in fact, is simply EVALuated by {PRLANG LISP} to test its
truth; and the action EVALuated to make its conclusions), this tightly
structured form makes it possible to examine the rules as well as execute them.
This, in turn, leads to some important capabilities, described below.  As
one  example, the internal form (i.e., {PRLANG LISP})  can be
translated into the English form shown in {YONFIG RULE}.
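Because a rule is data as well as code, it can be examined and paraphrased.
The following Python sketch (with an illustrative data layout, not MYCIN's
actual encoding) shows one way the stylized if/then form supports
mechanical translation into English:

```python
# Sketch of the dual use of a rule: the same structure that would be
# executed to test its premise can instead be examined to render an
# English reading.  SAME, MEMBF, and CONCLUDE mirror the predicate
# functions in the text; the layout here is illustrative only.

RULE050 = {
    "premise": ("$AND",
                ("SAME",  "INFECT", "PRIMARY-BACTEREMIA"),
                ("MEMBF", "SITE",   "STERILESITES"),
                ("SAME",  "PORTAL", "GI")),
    "action":  ("CONCLUDE", "IDENT", "BACTEROIDES", 0.7),
}

def english(rule):
    """Examine (rather than execute) the rule to produce a reading."""
    clauses = [f"the {attr.lower()} is {val.lower()}"
               for (_, attr, val) in rule["premise"][1:]]
    _, attr, val, cf = rule["action"]
    return (f"If {', and '.join(clauses)}, then there is suggestive "
            f"evidence ({cf}) that the {attr.lower()} is {val.lower()}.")
```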
	Facts about the world ({YONFIG FTUPLES}) are represented as 4-tuples made
up of an 
associative triple and its current CF.  Positive CFs indicate a predominance of
evidence confirming a hypothesis; negative CFs indicate predominance of
disconfirming evidence.
.STARTFIG;
		(IDENT ORGANISM-2 KLEBSIELLA .25)
		(IDENT ORGANISM-2 E.COLI .83)
		(SENSITIVS ORGANISM-1 PENICILLIN -1.0)
.FIG (Examples of representation of facts, FTUPLES:);
.ENDFIG;
.continue
Note that the model of inexact logic permits the coexistence of several
plausible values for a single attribute, if suggested by the evidence.
Thus, for example, after attempting to deduce the identity (%AIDENT%*) of an
organism, {SYSTM MYCIN} may have concluded (correctly) that there is
evidence both for E.coli and for Klebsiella.
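A minimal Python sketch of this fact representation (illustrative only)
shows how several hypotheses for one attribute coexist, each carrying its
own CF:

```python
# Sketch of the 4-tuple fact representation: an associative triple
# (attribute, object, value) plus the current CF.  Positive CFs mark a
# predominance of confirming evidence, negative CFs disconfirming.

facts = [
    ("IDENT",     "ORGANISM-2", "KLEBSIELLA", 0.25),
    ("IDENT",     "ORGANISM-2", "E.COLI",     0.83),
    ("SENSITIVS", "ORGANISM-1", "PENICILLIN", -1.0),
]

def values_of(attr, obj):
    """All hypotheses currently held for <attr> of <obj>."""
    return [(val, cf) for (a, o, val, cf) in facts
            if a == attr and o == obj]

# values_of("IDENT", "ORGANISM-2")
#   → [("KLEBSIELLA", 0.25), ("E.COLI", 0.83)]
```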
	There are thus two major forms of knowledge representation in use
in the performance program:  (%2i%*) the attributes, objects, and
values--which form a vocabulary of domain-specific conceptual primitives,
and (%2ii%*) the inference rules expressed in terms of these primitives.
There are, correspondingly, two forms of knowledge acquisition:# (%2i%*)
the acquisition of new primitives--to expand the performance program's
vocabulary of concepts, and (%2ii%*) the acquisition of new rules expressed
in terms of existing primitives.  && makes possible both of these; chapter
6 deals with the first, and chapter 5 describes the second.

.endind knowledge base
.SSS(Performance program:# The inference engine,BKGNDCS:);
.BEGINSMALLQUOTE;TURN ON "→";INDENT 0,0,0;
→Know that I have gone many ways wandering in thought.
.ENDSMALLQUOTE(|%2Oedipus the King%*, lines 66-67|);
.ind backward chaining
.beginind inference engine
.ind inference rules
.ind goal tree
	The rules are invoked in a simple backward-chaining fashion that
produces an exhaustive depth-first search of an and/or goal tree.  Assume
that the program is attempting to determine the identity of an infecting
organism.  It retrieves all the rules that make a conclusion about that
topic (i.e., they mention %AIDENT%* in their action) and invokes each one
in turn, evaluating each premise to see if the conditions specified have
been met.  For the rule in {YONFIG RULE}, this process would begin with
determining the type of the infection.  This, in turn, is set up as a
subgoal and the process recurs.
	The search is thus depth-first (because each premise condition is
thoroughly explored in turn); the tree that is sprouted is an and/or goal
tree (because rules may have OR conditions in their premise); and the
search is exhaustive (because the rules are inexact: even when one rule
succeeds, it was deemed a wisely conservative strategy to continue
collecting all evidence about the subgoal).
.ind subgoal
	Note that the subgoal that is set up is a generalized form of the
original goal. Thus, for the first clause in {YONFIG RULE} ("the infection
is primary-bacteremia"), the subgoal set up is "determine the type of
infection."# The subgoal is therefore always of the form "determine the
value of <attribute>" rather than "determine whether the <attribute> is
equal to <value>."# By setting up the generalized goal of collecting all
evidence about an attribute, the performance program effectively exhausts
each subject as it is encountered, and thus tends to group together all
questions about a given topic.  This results in a system that displays a
much more focused, methodical approach to the task, which is a distinct
advantage where human engineering considerations are important.  The cost
is the effort of deducing or collecting information that is not strictly
necessary.  However, since this occurs rarely--only when the <attribute>
can be deduced with certainty to be the <value> named in the original
goal--it has not proven to be a problem in practice.
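The control regime just described can be sketched as a small backward
chainer in Python. The rule content and names below are illustrative, and
CFs and the context tree are omitted:

```python
# Minimal backward chainer in the style described: tracing an attribute
# invokes every rule that concludes it (exhaustive search), and each
# premise attribute not yet known becomes a depth-first subgoal.  The
# subgoal is the generalized "determine the value of <attribute>".

RULES = [
    {"needs": {"INFECT": "PRIMARY-BACTEREMIA", "PORTAL": "GI"},
     "concludes": ("IDENT", "BACTEROIDES")},
    {"needs": {"BURN": "SERIOUS"},
     "concludes": ("INFECT", "PRIMARY-BACTEREMIA")},
]

def trace(attr, known, ask):
    """Determine the value of <attr> -- the generalized subgoal."""
    relevant = [r for r in RULES if r["concludes"][0] == attr]
    for rule in relevant:                    # exhaustive: try every rule
        for sub in rule["needs"]:
            if sub not in known:
                trace(sub, known, ask)       # depth-first recursion
        if all(known.get(a) == v for a, v in rule["needs"].items()):
            known[attr] = rule["concludes"][1]
    if attr not in known:                    # deduction failed:
        known[attr] = ask(attr)              # ask the user

answers = {"BURN": "SERIOUS", "PORTAL": "GI"}
known = {}
trace("IDENT", known, ask=lambda a: answers.get(a))
# known["IDENT"] is now "BACTEROIDES", reached through INFECT.
```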
	If, after trying all relevant rules (referred to as "tracing" the
subgoal), the total weight of the
evidence about a hypothesis falls between -.2 and .2 (an empirical
threshold), the answer is regarded as still unknown.  This may happen if no
rules are applicable, if the applicable rules are too weak, if the
effects of several rules offset each other, or if there are no rules for
this subgoal at all.  In any of these cases, when the system is unable to
deduce the answer, it asks the user for the value of the subgoal (using a
phrase that is stored along with the attribute itself).
	The strategy of always attempting to deduce the value of a subgoal,
asking the user only when deduction fails, insures a minimum number of
questions.  It would also mean, however, that work might be expended
searching for a subgoal, arriving perhaps at a less than definite answer,
when the user might already know the answer with certainty.  To prevent
this inefficiency, some of the attributes have been labeled "laboratory
data," to indicate that they represent information available to the
physician as results of quantitative tests.  In these cases the
deduce-then-ask procedure is reversed and the system will attempt to
deduce the answer only if the user cannot supply it.  Given the desire to
.ind tree search
minimize both tree search and the number of questions asked, there is no
guaranteed optimal solution to the problem of deciding when to ask for
information and when to try to deduce it.  But the distinction described
has performed quite well and seems to embody a very appropriate criterion.
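The deduce-then-ask discipline, and its reversal for laboratory data, can
be sketched as follows (Python, with hypothetical function names):

```python
# Sketch of the two question orderings.  For ordinary attributes the
# program deduces first and asks only when the net evidence falls in
# the empirical "unknown" band (-.2, .2); for attributes marked as
# laboratory data the order is reversed.

def settled(cf):
    """Decisive evidence lies outside the (-.2, .2) band."""
    return not (-0.2 < cf < 0.2)

def findout(attr, labdata, deduce, ask):
    """deduce(attr) -> (value, cf); ask(attr) -> value or None."""
    if labdata:                        # lab result: user likely knows it
        answer = ask(attr)
        if answer is not None:
            return answer
        return deduce(attr)[0]         # deduce only if user cannot say
    value, cf = deduce(attr)           # ordinary case: deduce first
    if value is not None and settled(cf):
        return value
    return ask(attr)                   # evidence too weak: ask
```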
	Two other additions to straightforward tree search increase the
inference engine's efficiency.  First, before the entire list of rules for
a subgoal is retrieved, the program attempts to find a sequence of rules
that would establish the goal with certainty, based only on what is
currently known.  Since this is a search for a sequence of rules with CF=1,
.ind unity path 
the result is termed a %2unity path%*.  Besides efficiency considerations,
this process offers the advantage of allowing the program to make "common
sense" deductions with a minimum of effort (rules with CF=1 are largely
definitional).  Because there are few such rules in the system, the search
is typically very brief.
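A unity path search might be sketched as follows (Python; the rule content
is illustrative and the actual implementation differs):

```python
# Sketch of the "unity path" search: before the full rule list for a
# subgoal is retrieved, look for a chain of CF=1 (largely definitional)
# rules whose premises are already satisfied by what is known.

def unity_path(goal, known, rules):
    """True if <goal> follows with certainty from <known> through
    CF=1 rules alone -- a cheap common-sense deduction."""
    if goal in known:
        return True
    for rule in rules:
        if rule["concludes"] == goal and rule["cf"] == 1.0:
            if all(unity_path(p, known, rules) for p in rule["premise"]):
                known.add(goal)
                return True
    return False

rules = [{"premise": ["ORGANISM-IS-COCCUS-IN-CHAINS"],
          "concludes": "ORGANISM-IS-STREPTOCOCCUS", "cf": 1.0}]
# unity_path("ORGANISM-IS-STREPTOCOCCUS",
#            {"ORGANISM-IS-COCCUS-IN-CHAINS"}, rules) → True
```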
.ind partial evaluation
	Second, the inference engine performs a partial evaluation of rule
premises.  Since many attributes are found in several rules, the value of
one clause (perhaps the last) in a premise may already have been
established while the rest are still unknown.  If this clause alone would
make the premise false, there is clearly no reason to do all the search
necessary to establish the others. Each premise is thus "previewed" by
evaluating it on the basis of currently available information. This
produces a Boolean combination of TRUEs, FALSEs, and UNKNOWNs;
straightforward simplification (e.g., %AF#∧#U#≡#F%*) indicates whether the
rule is guaranteed to fail.  This procedure is examined in more detail in
{YON2 BKGNDRHLL}.
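Previewing amounts to evaluation in a three-valued logic. A Python sketch
(ours, not the system's code):

```python
# Sketch of rule "previewing": each premise clause is looked up in what
# is already known, giving TRUE, FALSE, or UNKNOWN, and the Boolean
# combination is simplified (F ∧ U ≡ F, T ∨ U ≡ T).  A rule whose
# preview simplifies to FALSE is discarded without any tree search.

T, F, U = "TRUE", "FALSE", "UNKNOWN"

def and3(vals):
    if F in vals:                   # F ∧ anything ≡ F
        return F
    return U if U in vals else T

def or3(vals):
    if T in vals:                   # T ∨ anything ≡ T
        return T
    return U if U in vals else F

def preview(clauses, known):
    """clauses: list of clause names; known: {clause: T or F}."""
    return and3([known.get(c, U) for c in clauses])

# preview(["a", "b", "c"], {"b": F}) → FALSE: the rule is sure to
# fail, so clauses a and c need never be traced.
```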
	The final aspect of the control structure is the tree of objects
(or contexts) that is constructed dynamically from a fixed hierarchy as the
consultation proceeds ({YONFIG OBJECTTREE}).  This tree serves several
purposes.  First, bindings of free variables in a rule are established by
the context in which the rule is invoked, with the standard access to
contexts that are its ancestors.  Second, since this tree is used to
represent the relationships of objects in the domain, it helps structure
the consultation in ways already familiar to the user.  For example, in the
medical domain, a patient has one or more infections, each of which may
have one or more associated cultures, each of which in turn may have one or
more organisms growing in it, and so on.
.STARTFIG;
.boxfig;
			    PATIENT-1

				}
				}
				}

			   INFECTION-1

			      /  \
			     /    \ 
			    /      \
			   /        \

	 PREVIOUS-CULTURE-1          CULTURE-1

	      }                      /    \
	      }                     /      \
	      }                    /        \
	      }                   /          \

     PREVIOUS-ORGANISM-1     ORGANISM-1     ORGANISM-2

.FIG(A sample tree of objects,OBJECTTREE:);
.ENDFIG;
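The ancestor-directed binding lookup that such a context tree supports can
be sketched in Python (the class layout is ours, illustrative only):

```python
# Sketch of the dynamic tree of contexts: each object knows its parent,
# and looking up a free variable may climb to ancestor contexts, as the
# text describes for rule invocation.

class Context:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.props = name, parent, {}

    def lookup(self, attr):
        """Find <attr> here or in the nearest ancestor that has it."""
        node = self
        while node is not None:
            if attr in node.props:
                return node.props[attr]
            node = node.parent
        return None

patient   = Context("PATIENT-1")
infection = Context("INFECTION-1", patient)
culture   = Context("CULTURE-1", infection)
organism  = Context("ORGANISM-1", culture)

patient.props["NAME"] = "FRED SMITH"
# organism.lookup("NAME") → "FRED SMITH", found three levels up.
```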
.endind inference engine
.SSS(Domain independence and range of application);
.ind domain independence
	The fundamental design and implementation of the system in 
{YONFIG DETAILEDOVERLAY} does not restrict its use to medical domains.
This is due primarily to the modularization suggested in that figure and to
the fact that && is oriented solely around the concepts of rules,
attribute-object-value triples, etc.
	The clear distinction between the inference engine and the
knowledge base, for instance, makes it possible to remove one knowledge
base from the performance program and replace it with another.  It has
proven possible, for instance, to build separate knowledge bases for such
disparate areas as auto mechanics and psychiatric chemotherapy.  In
the first such effort ([[vanMelle74]), a small part of an auto repair
manual was rewritten as production rules and inserted in place of the
bacteremia knowledge base.  What resulted was a very simple but fully
functional consultant capable of diagnosing and suggesting remedies for
problems in parts of an auto electrical system.  More recently, a pilot
system for psychiatric diagnosis and chemotherapy was assembled.  While
this program had only 50 rules, it too was fully functional and displayed
primitive but encouraging performance.  In both systems, all of the
established explanation facilities worked as designed, without
modification.
	There are, naturally, some domains that might be less profitable to
explore.  One of the interesting lessons of the auto repair system was that
domains with little inexactness in the reasoning process--those for which
algorithmic diagnostic routines can be written--are not particularly
appropriate for this methodology.  The precision in these domains means
that little use is made of the certainty factor mechanism, and many of the
more complicated (and computationally expensive) features go unused.  The
effect would be akin to swatting a fly with a large doctoral thesis--all
that work and weight are unnecessary when something far simpler would do.
.ind inference rules
	Nor is it reasonable to expect to be able to write inference rules
built from attribute-object-value triples for an arbitrary domain.  As
knowledge in an area accumulates, it becomes progressively more formalized.
There is a certain stage in this formalization process when it is
appropriate to use rules of the sort shown above.  Earlier than this
the knowledge is too unstructured; later on it may (like the auto repair
system) be more effective to write straightforward algorithms.
	It is also possible that the knowledge in some domains is
inherently unsuited to a rule-like representation, since rules become
increasingly awkward as the number of premise clauses increases.  Dealing
with a number of interacting factors may be difficult for any
representation, but given the reliance here on rules as a medium of
communication of knowledge, the problem becomes especially significant.
	Finally, the current performance program makes extensive use of the
.ind associative triple
attribute-object-value associative triple.  This is perhaps the most
limiting factor, since this kind of representation is effective only in
simpler domains.  It is also, however, a far more implementation-dependent
factor than the other two mentioned.  While it would be difficult, the
triples could be replaced by a more powerful representation scheme without
adversely affecting the feasibility of the techniques for knowledge
acquisition and use that are described in subsequent chapters.
.SS(Production rules,PRODRULES:);
	While many of the ideas on which && is based are applicable to a range
.ind knowledge representation
of knowledge representations, its design and implementation have been strongly
influenced by the use of a rule-based encoding of knowledge.  This section
explores production rules from several perspectives, to provide a basis for some
of the issues discussed in later chapters.

.SSS(Production rules in general);
.beginind production rules
	Production rules have been the subject of much work since their
introduction by Post ([[Post43]) as a general computational mechanism.
Across the wide range of variations and techniques explored, there appears
to be a set of fundamental characteristics intrinsic to the methodology. (An
extended discussion of these characteristics can be found in [[Davis77a].)
	The present discussion will be concerned with
two problems  typically encountered in using production rules.  They are
described briefly here, so that their manifestations will be evident in
later discussions of system performance.  They are also described in terms which
should make it clear that these problems are inherent in the production system
methodology itself. This will enable the reader to distinguish them from
shortcomings that may have arisen from our approach to various uses of meta-level
knowledge. Thus, while && displays certain limitations,
some of these are a result of the approach, while others are a legacy of the
particular representation of knowledge chosen early in the development of the
performance program.
	One problem is the limit on the amount of knowledge
that can be expressed conveniently in a single rule.  Actions that are
"larger" than this limit are often achieved by the combined effects of several
rules.  For many reasons (see [[Davis77a], section 5), this is difficult to
do and often produces opaque results.  We will see that this is a significant
consideration when judging  the utility of production rules as a knowledge
representation.
.ind implicit context
	Second, there is what has been labeled the implicit context problem.
Perhaps the simplest example can be found in a production-rule-based system
that uses a sequential left-hand side (LHS) scan--that is, it starts out
at the beginning of the rule set and searches sequentially through it, examining
the LHS of each rule until one is found that meets the desired criteria.  Suppose
the LHS of the first three rules are:
.STARTFIG;
R1:			A ∧ B ∧ C  %5==@   α#α#α#%*
R2:				D  %5==@   α#α#α#%*
R3:			    E ∧ F  %5==@   α#α#α#%*
.ENDFIG; continue;
R3 won't  be tested unless D is false and either A, B, or C is
false. The rule thus, effectively, has two extra, implicit clauses of the form
%A(not D)%* and %A((not A) ∨ (not B) ∨ (not C))%*, simply because of its
location in the rule set.  Note that the location is not %2necessarily%*
critical--the entire thrust of the rule could possibly be summed up in the
presence of just E and F.  But often the location %2is%* an important element
and the implicit clauses are significant.∪∪Indeed, [[Waterman70] uses this idea to
great advantage.∪
	It is tempting to make use of implicit context to take advantage of the resulting
conciseness--consider how  short R3 is and how long an explicit
statement of the 99th rule might be if it depended on the failure of the first
98.
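The effect can be demonstrated with a few lines of Python (an illustrative
interpreter, not any particular system's):

```python
# Sketch of the implicit context problem: with a sequential LHS scan,
# R3 is reached only when R1 and R2 have already failed, so its real
# applicability condition silently includes their negations.

def sequential_scan(rules, state):
    """Fire the first rule whose explicit LHS is satisfied."""
    for name, lhs in rules:
        if all(state.get(c, False) for c in lhs):
            return name
    return None

rules = [("R1", ["A", "B", "C"]),
         ("R2", ["D"]),
         ("R3", ["E", "F"])]

# E and F alone do not suffice: R3 fires only because D is false and
# one of A, B, C is false -- two clauses it never states.
state = {"E": True, "F": True, "D": True}
sequential_scan(rules, state)       # → "R2", not "R3"
```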
	While the point has been illustrated via an ordered rule set, the same
problem can arise from other causes. {SYSTM MYCIN}'s rules are not
ordered, but they are classified according to the object to which they
apply and similar problems arise from this.
	The essential point here is twofold:# First, %2any%* piece of structuring
in the system carries information.  Second, this information can be implicitly
passed on to the rules in subtle ways. We will see examples of this in
discussing the organization of meta-rules in chapter 7.

.SSS(Production rules as a knowledge representation,);
	&&'s first goal--putting a domain expert in direct communication
with a high performance program--requires a common language of
communication.  This is especially important for the specification of
knowledge that the expert wants to add to the system. He must be able to
express it in a form that is the same as (or transformable into) the one
used internally by the program.  Thus, the ease of establishing the link
between expert and program is strongly affected by the knowledge
representation chosen.
.ind knowledge representation
	While we cannot offer formal arguments, there are several reasons
why production rules of the sort shown in {YONFIG RULE} are a useful
representation.  First, the general task of deduction is one that fits
quite well into the situation/action character of production rules.  There
is therefore less transformation necessary between the knowledge as
expressed by the expert and its final encoding inside the system.
	Next, the rules are, by themselves, comprehensible chunks of
knowledge, since they carry in their premise a full specification of their
applicability.  Their independence also facilitates the incremental growth
of the knowledge base:##Rules can be added one by one, and performance
improves with each addition.
	Rules are also retrieved and invoked on the basis of the content of
their right-hand side (rather than on the basis of their name) and never
reference one another.  As a result, adding (or changing) a rule is a far
easier task than would be the case if there were extensive references to
the rule's name in many places.  Consider the difficulty, by contrast, of
editing a procedure in an {PRLANG ALGOL}-like language and then having to
go back and find all the places it is called to see if those calls are
still appropriate.
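The contrast can be made concrete with a small sketch (Python, with hypothetical rule names and attributes): rules are retrieved by the content of their conclusions, so a newly added rule is usable immediately, with no call sites to update.

```python
# Hypothetical sketch: rules indexed by what they conclude, never invoked
# by name. Retrieving "every rule that concludes about IDENTITY" means a
# new rule participates the moment it is added.

RULES = [
    {"name": "RULE050", "premise": ["SITE", "MORPH"], "concludes": "IDENTITY"},
    {"name": "RULE037", "premise": ["GRAM"],          "concludes": "IDENTITY"},
    {"name": "RULE112", "premise": ["IDENTITY"],      "concludes": "THERAPY"},
]

def rules_about(attribute, rules):
    """Retrieve every rule whose right-hand side concludes about attribute."""
    return [r["name"] for r in rules if r["concludes"] == attribute]
```

Adding a fourth rule that concludes about IDENTITY changes nothing in the other rules or in the retrieval mechanism; it is simply found on the next retrieval.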
	The rules also seem to capture a "chunk" of knowledge of the
appropriate size. A significant amount of knowledge of bacteremia diagnosis
and therapy has been encoded in rules that have between two and five
premise clauses and one or two actions.  While all domains may not offer
such convenient "bite-size" chunks, the success of the systems in the other
two domains that have been explored is encouraging.
.ind backward chaining
	The control structure--backward chaining--also appears to be
reasonably intuitive.  While such %2modus ponens%* reasoning is not often
recognized explicitly, it is common cognitive behavior and should therefore
not be alien to the expert.
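A toy backward chainer (Python; the rules and attributes are invented for illustration) captures the goal-directed character of this reasoning: to establish a goal, find rules concluding it and recursively pursue their premises.

```python
# Illustrative sketch of backward chaining: each rule is a (premise,
# conclusion) pair; a goal is established either because it is a known
# fact or because some rule concluding it has an establishable premise.

RULES = [
    (["gram-negative", "rod"], "enterobacteriaceae"),
    (["enterobacteriaceae", "lactose-positive"], "e-coli"),
]

def prove(goal, facts, rules):
    """Goal-directed search: trace a goal back through rule premises."""
    if goal in facts:
        return True
    return any(concl == goal and all(prove(p, facts, rules) for p in premise)
               for premise, concl in rules)
```

The recursion mirrors the subgoaling described earlier: pursuing "e-coli" leads the system to trace "enterobacteriaceae" first.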
	Finally, the rules are, for the most part, what may be labeled a
"single level" mechanism.  They are composed of elements that are
conceptual primitives and require no further decomposition to be
understood.  It is clear to the user, for instance, what is meant by "the
site is X," or "conclude that the identity is Y."# Compare this with the
difficulty that would arise if the action of a rule were the invocation of
a deep and complex calculation.
	As a result of all of these factors, production rules of the sort
shown above have proven to be an effective and useful representation.

.ind knowledge organization
.SSS(Impact on knowledge organization);
.ind procedural encoding
	Production rules also contribute their own perspective on the fundamental
organization of knowledge in a program, a perspective quite different from that
associated with procedure-oriented representations. Production rules tend to
de-emphasize the hierarchical control structure natural to procedural
languages
and substitute a single, uniform collection of knowledge "chunks" (the rules).
Since each rule is retrieved on the basis of its contents (as the rule in
{YONFIG RULE}
is retrieved on the basis of its conclusion), no rule is ever called
directly, in the style of procedures.  Thus, the addition (or deletion) of a rule
does not require the modification of any other rule to provide for (or
delete) a
call to it.  The result is a program whose knowledge base is easily modified
and whose behavior is relatively stable in the face of those changes.
	This stability might be demonstrated by repeatedly
removing rules from a production-rule-based program.  Many such systems will
continue to display some sort of "reasonable" behavior, up to a point.  By
contrast, adding a procedure to an {PRLANG ALGOL}-like program requires
modification of other parts of the code to insure that it is invoked, while
removing an arbitrary procedure from such a program generally cripples it.
	Note that the issue here is more than simply the "undefined function"
error message that would result from a missing procedure. The problem persists
even if the compiler or interpreter is altered to treat undefined functions as
no-ops. The issue is a much more fundamental one concerning organization of
knowledge:# Programs written in procedure-oriented languages stress the kind of
explicit passing of control from one section of code to another that is
characterized by the calling of procedures.  This is typically done at
a selected time and in a particular context--both carefully chosen by the
programmer.  If a no-op is substituted for a missing procedure, the
environment upon
return will not be what the programmer expected, and subsequent procedure
calls will be executed in increasingly incorrect environments. Similarly,
procedures that have been added must be called from %2somewhere%* in the
program, but the location of the call must be chosen carefully if the effect is
to be meaningful.
	Production systems, on the other hand,
emphasize the decoupling of control flow from the writing of rules.  Each rule
is designed to be a modular chunk of knowledge with its own statement of
relevance.  Thus, where the {PRLANG ALGOL} programmer carefully chooses the
order of procedure calls to create a selected sequence of environments, in a
production system it is the environment which chooses the next rule for
execution.  And since a rule can only be chosen if its criteria of relevance have
been met, the choice will continue to be a plausible one, and system behavior
will remain "reasonable," even as rules are successively deleted.
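The stability claim can be sketched directly (Python, with invented rules): since each rule carries its own statement of relevance, deleting one removes its conclusion while every remaining applicable rule still fires.

```python
# Minimal sketch of behavioral stability under rule deletion: no rule
# calls another, so removing a rule removes only its own contribution.

def conclusions(rules, facts):
    """Fire every rule whose premise is satisfied by the current facts."""
    return [concl for premise, concl in rules if premise(facts)]

RULES = [
    (lambda f: "fever" in f,         "infection-suspected"),
    (lambda f: "gram-negative" in f, "enterobacteriaceae-possible"),
    (lambda f: "blood-culture" in f, "bacteremia-possible"),
]

facts = {"fever", "blood-culture"}
```

Deleting the first rule costs the system one conclusion but leaves the rest of its behavior intact; the analogous deletion of a procedure from an ALGOL-like program would typically break the call chain.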

.ind high-level language
.SSS(Production rules as a high-level language,BKGNDRHLL:);
	As noted, the inference rules are composed of clauses made up of
predicate functions and attribute-object-value triples.  The entire collection
of these elements forms a set of conceptual primitives for any given domain.
Thus in dealing with infectious disease, for example, there are %Acultures%* with
%Asites%* like %Ablood%*.
	If we consider pushing this back one level, it becomes plausible to consider
the concepts %2predicate function, attribute, object,%* and%2 value%* as
conceptual primitives in the more general domain of knowledge
representation.  We
consider each of them  an indication of a whole class of
objects, with individual instances supplied by the domain of application.  This
.ind data types
suggests treating them as extended data types, which is a useful
analogy to keep in mind.  There are 13 such "data types," used
as a set of conceptual primitives for expressing knowledge in rule form.
They are the data types of what is, in effect, a high-level programming
language--one whose syntax is very restricted and whose sole statement
type is a rule.  Since
we refer to them often in what follows, they are listed and described
below, for reference:
.STARTFIG; INDENT 0,0,0
predicate function
attribute              (abbreviated as ATTRIB) 
object                 (for historical reasons, also called  
			a "context," abbreviated as CNTXT)
value 
certainty factor       (a real number between -1 and 1, often
		        abbreviated CF)
list                   (the standard {PRLANG LISP} concept)
atom                   (the standard {PRLANG LISP} concept)
string                 (the standard {PRLANG LISP} concept)
slotname               (discussed in chapter 6)
slotexpert             (discussed in chapter 6)
blank                  (discussed in chapter 6)
advice                 (discussed in chapter 6)
table                  (discussed in chapter 6)
.ENDFIG;
	This concept, of a high-level language with a restricted syntax and
a small number of data types, forms an important base underlying many of
the capabilities of &&.  Perhaps the most fundamental of these is the
ability to make multiple uses of a single body of knowledge.  Because &&
can both assemble and dissect rules, we have a system that can not only use
its knowledge of a domain directly, but can also examine it, alter it,
abstract it, and draw conclusions about it.  The rules are thus, at
different times, both code and data and are used in both capacities almost
equally.  A large number of interesting and useful features follow from
this; they are explored in subsequent chapters.  To clarify this point,
however, we offer a simple but illustrative example.

	As indicated earlier in our discussion of the control structure, before
.ind partial evaluation
invoking a rule the system performs a partial evaluation of its premise to make
sure that the rule is not already guaranteed to fail.  But performing this evaluation
is nontrivial.  The system requires a way to tell if any clause in the premise
is known to be false.  It cannot simply EVALuate each clause individually, since a
subgoal that had never been traced before would send the system off on its
recursive search.
	However, if the system can establish which attribute is used in
the clause, it is possible to determine  whether this attribute
has been traced previously
(by reference to internal flags).  If so, the clause can be EVALuated to obtain
the value.
	This process is made possible by a %ATEMPLATE%* associated with each predicate
function.  It describes the format of any call of that predicate function, by
giving the generic type and order of the arguments to the function. It thus
resembles a simplified procedure declaration.
.STARTFIG;
Function           Template             Sample function call

 SAME	   (SAME CNTXT ATTRIB VALUE)   (SAME CNTXT SITE BLOOD)

.FIG(Example of a function template);
.ENDFIG;CONTINUE;
The template is not itself a piece of code but is simply a list structure
of the sort shown above, indicating the appearance of an interpreted call
to the predicate function.  Since rules are kept in interpreted form, as
shown in {YONFIG RULE}, the template can be used as a guide to dissect a
rule.  For each clause in a rule, && retrieves the template associated with
the predicate function found in that clause (i.e., the template associated
with the %ACAR%* of the clause)  and uses it to guide the examination of
the clause.  In the case of the function %ASAME%*, for instance, the
template indicates that the attribute (%AATTRIB%*) is the third element of
the list structure that comprises the function call.  The previewing
mechanism uses the templates to extract the attribute from the clause in
question and can then determine whether or not it has been traced.
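The dissection step can be sketched as follows (Python; the template contents follow the figure above, while the function names are illustrative, not system internals):

```python
# Sketch of template-guided dissection: the template stored with each
# predicate function says where the attribute sits in an interpreted call,
# so the previewer can extract it without EVALuating the clause and
# triggering a recursive trace.

TEMPLATES = {
    "SAME":    ["SAME", "CNTXT", "ATTRIB", "VALUE"],
    "NOTSAME": ["NOTSAME", "CNTXT", "ATTRIB", "VALUE"],
}

def attribute_of(clause):
    """Use the template of the clause's predicate function (its CAR)
    to locate the ATTRIB argument."""
    template = TEMPLATES[clause[0]]
    return clause[template.index("ATTRIB")]

def previewable(clause, traced):
    """A clause can safely be EVALuated only if its attribute has
    already been traced."""
    return attribute_of(clause) in traced
```

Note that nothing in the dissection routine mentions any particular predicate function: adding a new function requires only supplying its template.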
	There are two points of interest here:
.BEGINLIST;
	(a)\Part of the system is examining the code (the rules) being
executed by another part, and
	(b)\this examination is guided by the information carried in components of the rules
themselves. 
.ENDLIST;
The ability to examine the code could have been accomplished by requiring
all predicate functions to use the same format, but this is obviously
awkward.  Allowing each function to describe the format of its own calls
permits code to be stylized without being constrained to a single form, and
.ind flexibility
hence is flexible and much easier to use.  This approach requires only that
each form be expressible in a template built from the current set of
conceptual primitives.  It also insures that the capability will persist in
the face of future additions to the system. The result is one example of
the general idea of giving the system access to, and an "understanding" of,
its own representations.  Additional examples of this concept are spread
throughout the remainder of this work.
.endind production rules
.SS(Levels of knowledge,LEVELSOFK:);
.ind meta-level knowledge
	The concept of meta-level knowledge introduced in chapter 1 can be
defined
more generally as multiple levels of knowledge.∪∪There
have been other uses of the term "levels of knowledge," most
notably to describe the hierarchy of domain knowledge in the {SYSTM HEARSAY II}
system [[Lesser74] (e.g., phoneme, word, syntax, semantics, etc.).  We use the term here in a
different sense, that of a hierarchy of %2types%* of knowledge, and intend that
meaning throughout.∪  This idea has several
important applications in the work reported here.  In chapter 7, 
.ind strategies
strategies are defined in terms of a knowledge hierarchy with an arbitrary
number of levels.  A different sort of hierarchy is responsible for much
.ind knowledge acquisition
of the performance and generality of &&'s knowledge acquisition
routines.  In this hierarchy, knowledge is stratified into three distinct levels.  A brief
description of it here will help to clarify many of the ideas involved
and illustrate their power.
	The first level of the hierarchy contains the
object-level knowledge--medical knowledge of cultures,
organisms, drugs, etc.  High performance on the task of diagnosis and therapy
selection is supported by an extensive collection of knowledge about objects in
the medical domain. This first level is naturally limited to this single domain.
.ind knowledge representation
	The next level is concerned with the conceptual building blocks
of the knowledge representation--the predicate functions, attributes, values,
rules, and so on.  Performance on the task of knowledge acquisition is dependent
upon an extensive body of knowledge about these building blocks.  That is, there
is assembled at this level a large amount of knowledge about the representational
primitives.  As will become clear in chapters 5 and  6, the system has an
"understanding" of what an attribute is, what roles it plays, etc.
Since no reference is made at this level to any specific instance of any of the
primitives, this level of knowledge has a degree of domain independence.  Over
the range of domains in which knowledge can be represented in terms of these
primitives, the knowledge acquisition routines are similarly domain
independent.
	Knowledge at the third level is concerned with the conceptual
primitives behind representations in general.  To make this clearer, consider
the recursion of ideas:# To aid the construction of a high-performance
(object-level) program, we build a (meta-level) system that can acquire
object-level
knowledge.  Its performance at this task is based on an extensive store of
knowledge about specific representations.  %2But it in turn is "just" another
knowledge-based system.%* By supplying the proper set of
representation-independent primitives (and a store of knowledge about them),
we can use
precisely the same formalism (indeed, the same code) to provide a system for
acquiring knowledge about individual representations.∪∪The 
details and examples of this bootstrapping process are given in chapter 6.∪
	In this way, the second-order system can be used to acquire knowledge
about a representation.  This, in turn, becomes the knowledge base for the
meta-level
system, which then facilitates the construction of the knowledge base for
the object-level performance program.  The two stages of this process are shown
below.
.STARTFIG;BOXFIG; TURN OFF "%";

            teaching about        teaching about the
           a representation      domain of application
⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃      ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃       ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃
}knowledge of   }      }knowledge of   }       }              }
}representation-}      }primitives for }       } object-level }
}independent    }= = @ }a specific     } = = @ }knowledge base}
}primitives     }      }representation }       }              }
%∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$      %∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$       %∂∂∂∂∂∂∂∂∂∂∂∂∂∂$

   SYSTEM 2                SYSTEM 1                SYSTEM 0

.FIG(The conceptual process);
.ENDFIG;
.continue
Note, however, that while this indicates the process as it appears
conceptually, a more accurate system view is shown below.  Here we have
only two systems, because there is in fact only a single higher level
system.  This is possible because the process of teaching about a
%2representation%* can be made computationally identical to the process of
teaching about specific %2instances%* of that representation.  The two are
therefore accomplished with precisely the same mechanism.
.STARTFIG;BOXFIG;
			 teaching about the
			domain of application
	  ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃               ⊂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂⊃
  ⊂∂∂∂∂`≥ }  meta-level   }               }  object-level  }
  } ⊂∂∂≤' }    system     }    = = @      }     system     }
  } }     α%∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$               α%∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$
  } }            } }
  } }		 } }
  } α%∂∂∂∂∂∂∂∂∂∂∂∂$ }
  α%∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂∂$
    teaching about
   a representation
.FIG(The computational process);
.ENDFIG;
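The identity of the two teaching processes can be suggested with a sketch (Python; every name and schema here is invented for illustration): a single acquisition routine, parameterized by its current store of knowledge about primitives, is applied first to acquire a representation and then, with that result as its store, to acquire object-level knowledge.

```python
# Conceptual sketch of the bootstrapping: one acquisition routine serves
# at both levels. All schemas and slot names are hypothetical.

def acquire(schema, entry):
    """Accept a new chunk of knowledge if it fits the current schema."""
    if set(entry) != set(schema):
        raise ValueError("entry does not match schema")
    return dict(entry)

# Teaching about a representation: define what an "attribute" is, in
# terms of representation-independent slots.
attribute_schema = acquire(
    {"name", "slots"},
    {"name": "attribute", "slots": ["legal-values", "prompt"]},
)

# Teaching about the domain: the result of the first step now serves as
# the schema for acquiring a specific object-level attribute.
site = acquire(
    set(attribute_schema["slots"]),
    {"legal-values": ["BLOOD", "SPUTUM"], "prompt": "the site of the culture"},
)
```

The same mechanism is invoked twice; only its knowledge store differs, which is the sense in which there is a single higher level system.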
	In &&, the recursive application of the knowledge acquisition
system offers a certain degree of representation independence; its extent
and limitations are examined in chapter 6.
	We began this report by noting that the original search for
.ind generality
generality--set in the domain of problem-solving methods--has proved
unsuccessful thus far.  Knowledge-based methods have been suggested as an
alternative, but each has a sharply limited range of application, and the
lure of generality remains.  Is there another way to salvage it?
	One of the underlying themes of this work is the attempt to capture
.ind knowledge acquisition
a different form of generality, one that has its source in knowledge
acquisition rather than in problem-solving methods.  That is, if programs
require large stores of knowledge for performance, then can we not take a
step back and discover powerful, broadly applicable techniques for
accomplishing the transfer of knowledge from expert to program?  The
resulting man-machine combination would be a semi-automatic system, whose
generality arose from access to the appropriate human experts and whose
power was based on the store of knowledge it acquired from them.  The next
five chapters discuss several steps toward the realization of this aim.